BY XICHUAN ZHANG,
RENO FIEDLER, AND
MICHAEL POPOVICH



A Biointelligence System
for Identifying Potential
Disease Outbreaks

Monitoring Over-the-Counter Pharmaceutical Sales
Data as an Indicator of Changes in Public Health Status



It can reasonably be expected that consumer spending patterns, such as
purchases of over-the-counter pharmaceuticals, hold a strong relationship
to the public health status, possibly indicating status changes earlier
than traditional public health information systems. In the light of events
over the last two years, such as the anthrax scare, the SARS outbreak, and
other public health events, over-the-counter (OTC) data collection has
received significant attention. Most OTC projects focus on data collection
and basic statistical analysis. However, many questions have yet to be
answered. For example: How does the OTC sales data link to the public
health status and how can this be described in a rigorous framework? How
can potential disease outbreaks be detected within the noise of real-world
OTC data? Syndromic surveillance has come to the attention of public health
only recently, so how can its results be integrated and validated within
the existing body of public health knowledge? Overall, there exists no
systematic framework for the potentially automated collection and analysis
of OTC data for public health management consumption. This area remains
largely unexplored. Small sampling sizes, significant fluctuations within
OTC sales data, and the lack of evidentiary information confirming it make
OTC sales surveillance systems challenging and difficult. These
difficulties are magnified in that a biointelligence system (BIS) must
become a component of the established public health information system
infrastructure, while requiring an end-to-end implementation of advanced
information technologies and a near real-time execution.

The development of public health surveillance systems requires
multidisciplinary knowledge and advanced technologies. Halperin and Baker
[1] provided an excellent summary on public health surveillance systems,
and the Centers for Disease Control and Prevention [2], [3] developed new
guidelines and recommendations for the evaluation of public health
surveillance systems.

"The lessons learnt from the events following September 11, 2001, and the
subsequent Anthrax attacks have proven that new and innovative technologies
and resources are absolutely necessary to ensure the nation is fully
prepared" [4]. To that end, Scientific Technologies Corp. has developed the
NH Pharmaceutical Sales Surveillance (NHPSS) for the New Hampshire
Department of Health and Human Services (NH DHHS). The NHPSS was developed
as a distributed information system. In addition to the database server,
enterprise application servers, and the Web-browser-based user interface,
its architecture features knowledge-base technology, a new dynamic system
model with rule systems, and automated data analysis in supporting public
health surveillance. Internet mapping was also embedded in the system to
provide for spatial analysis. Since its pilot application, started in
December of 2002 in the Bureau of Communicable Disease Control and
Surveillance, NH DHHS, the NHPSS has assisted NH DHHS in successful
detections of gastrointestinal and respiratory events.

This article first introduces the methodology and the technology in the
NHPSS development. Next, the system functionalities are introduced. Then, a
new dynamic system model with a rule system for public health surveillance
is described in detail. Finally, its application and preliminary results
are summarized.

Methodologies in Development of NHPSS

The multidisciplinary development of NHPSS draws from five system domains,
going beyond real-time data compilation:

1) information technology (connectivity);

2) data models [...];

3) knowledge base (domain knowledge and data-derived knowledge);

4) analytical methods (dynamic model and algorithms in automated
processing);

5) enterprise applications.

Each of these domains plays an important role, and a systematic integration
has been achieved in NHPSS. This article focuses on the data model,
knowledge-base technology, and analytical methods developed in the NHPSS.

Massive OTC daily sales data are collected at the pharmacy stores. These
sales can be categorized into medications treating respiratory-related
syndromes, gastrointestinal-related syndromes, allergy syndromes, etc.,
according to their active ingredients. The categorized sales data at a
store reflect the public health status, in this category, around that
geographical area. The sales amount is furthermore impacted by the
population in that area and the convenience of access to the service (both
hospital service and pharmacy service). After the purchase, a customer can
take the medicine for several days. These factors and the spatial and
temporal variations in the OTC sales have been quantitatively reflected in
the NHPSS data model and analytical process. To identify a potential
outbreak, it is necessary to establish a set of reference lines for OTC
sales. To that end, a measurement scheme has to be defined first. In the
NHPSS, the geographical units are defined as the store service area, zip
code area, city, and statewide area. In each geographical unit, the
underlying population data (possibly with the age groups) can be abstracted
from census data. A store's service area is derived from the driving
distance between the store and its potential customers' homes. Time units
used are daily, weekly, and monthly. Data processing and knowledge
acquisition will be discussed in the following section.

From Raw Data to Spatial Data Warehouse

After replication of the daily OTC sales data, the raw data are
automatically processed along spatial and temporal dimensions. In
conjunction with a rule base, basic statistical methods have been applied
here to derive the reference lines. The developed system requires a minimum
of one month of historical data, while it is recommended that more than one
year of data is available to improve the confidence set. Let x be the sales
amount of a categorized medicine for a time unit (e.g., daily). This amount
will be compared to reference lines (monthly) derived from historical sales
records of the prior years, the specified category syndrome, and the
geographical unit. The reference lines include a base line representing the
regular daily amount and upper reference lines incorporating the confidence
levels. Since the reference lines are computed periodically at each
geographical level, the seasonal variations are maintained and the spatial
characteristics are captured.
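
The derivation of reference lines described above can be sketched as
follows. This is a minimal illustration, not the NHPSS implementation: the
function name, the use of a sample standard deviation for the upper lines,
and the default sigma multiples are all assumptions.

```python
from statistics import mean, stdev

def reference_lines(history, n_sigma=(2.0, 3.0)):
    """Derive a base line and upper reference lines for one
    (category, geographical unit, calendar month) combination.

    history: daily sales amounts for that combination from prior records.
    n_sigma: hypothetical multiples of the standard deviation that stand in
             for the confidence levels of the upper reference lines.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    base = mean(history)      # regular daily amount
    spread = stdev(history)   # sample standard deviation
    return {
        "base": base,
        "upper": {n: base + n * spread for n in n_sigma},
        "min": min(history),
        "max": max(history),
    }

lines = reference_lines([10, 12, 14, 12, 12])
```

Recomputing the lines per month and per geographical unit is what keeps the
seasonal and spatial variation in the references.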

The rule system, integrated with the data warehouse approach, adapts the
automated data processing and handles the exceptions. Two possible special
cases have been considered: a) an epidemic outbreak was recorded in the
history of this place; and b) there may be less than one year of historical
data. The data warehouse archives the seasonally varying reference lines at
each geographical level.

It is worth noticing that a GIS tool was also integrated with the developed
spatial data warehouse. The GIS derives and organizes the spatial
background information, which includes the population with age groups. It
also performs the spatial analysis.

First, a measurement scheme was defined to quantitatively and qualitatively
evaluate the deviation of the incoming daily data from the reference line
(to identify the possible abnormality) at each place. Next, a set of
algorithms for a structural component analysis has been developed. The
incoming OTC data are transformed into the structural components. A mapping
of the structural components into the dynamic system model describes the
change of public health status and identifies the possible unusual events.
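
A quantitative and qualitative evaluation of one incoming daily value
against its reference lines might look like the sketch below. The relative
deviation as the quantitative measure and the three labels are assumptions
chosen for illustration; the article does not specify the measurement
scheme at this level of detail.

```python
def evaluate_deviation(sales, base, upper_lines):
    """Score one incoming daily value against its reference lines.

    upper_lines: upper reference values (hypothetical confidence levels).
    Returns (relative deviation from the base line, qualitative label).
    """
    if base <= 0:
        raise ValueError("base line must be positive")
    relative = (sales - base) / base                       # quantitative part
    exceeded = sum(1 for line in upper_lines if sales > line)
    labels = ["normal", "elevated", "abnormal"]            # hypothetical labels
    label = labels[min(exceeded, len(labels) - 1)]         # qualitative part
    return relative, label

result = evaluate_deviation(15, 10, [12, 14])
```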

A Dynamic Model for the Public Health Status

The dynamic process model in the state-space form was systematically
formalized by Kalman, Falb, and Arbib (1963) [5]. Rosenbrock (1970) [6]
extended multivariate systems in a state-space form. Since then,
mathematical system theory, modern control engineering, and computer
technology have enabled extensive successes in several industries. However,
nonlinear systems in a state-space form are much less well understood. In
public health, Castillo-Chavez et al. (2002) [7] state that "the basic
epidemiological equations are sufficiently nonlinear." The dynamic system
model in state space as a means to describe this complexity has not been
mentioned, as almost nothing is known about such models in the public
health context. This contrasts strongly with the commonly deployed
intensive statistical approaches and stochastic epidemic modeling. A
knowledge base with a rule system can significantly improve decision-making
support, because a large class of nonlinear functions can be described
there, and a priori knowledge can be formalized from domain experts or
derived from the data. Our effort has been to develop an integrated system
that would embrace dynamic system theory, statistical methods, and a rule
system with knowledge-base techniques as a unified tool. It is oriented
toward problem solving in syndromic surveillance, but it is an abstract,
generic framework with many possible generalizations and implementations.
The developed system can be implemented in a relatively short time, as
demonstrated in the case of NHPSS.

Figure 1 shows the defined states and state transition diagram. The dynamic
change of the public health status is modeled here in a new state-space
form. This state-space form differs from the conventional state-space
approaches in that here the state transition, input mapping, and output
mapping are governed by the rule system, while the conventional state-space
form uses crisp algebra or linear algebra in most cases. With a state-space
notation, at a specified place, the categorized public health status is
explicitly modeled by a set of state variables, which vary over time.
Defined by this model, in a specified place, at a specified time, a
categorized health status is one of the following: healthy status (S_h),
critical status (S_c), starting-unusual status (S_s), upward-trend-unusual
status (S_u), peak-unusual status (S_p), downward-trend-unusual status
(S_d), and ending-unusual status (S_e). The state transitions over time
reflect the dynamic change of the public health status.

Fig. 1. The states and state transition diagram of the public health status.

The state space S is defined with its state variables
S = (S_h, S_c, S_s, S_u, S_p, S_d, S_e). A validated state transition from
state to state is determined by the rule system, which operates in
relational algebra on its supporting set X_i(k). The validated transition
from state S_i(k) to state S_j(k+1) is determined by a rule base R_ij,
which evaluates the inputs at state S_i(k):

    S_j(k+1) = S_i(k) \otimes R_{ij} \otimes X_i(k), \quad [...]    (1)

    X_i(k) = R_x(\alpha(k), \beta(k), \gamma(k)) \otimes (u(k), v(k), w(k))    (2)

    y(k) = R_y(\gamma_0(k), \gamma_1(k), \ldots, \gamma_n(k), G_i) \otimes (S(k), \ldots, S(k-n))^T    (3)

where

    [...]    (4)

    [...]    (5)

    y_{ij} = (L_{ij}, T_{ij}, P_{ij})    (6)



In the defined system equations, there are three transformed components,
u(k), v(k), and w(k), which are derived from the incoming raw data and then
mapped into the supporting set. As time advances (for example, a time unit
can be daily), the state transition from state S_i(k) to state S_j(k+1) is
determined by the rule system R_ij, which evaluates the supporting set
X_i(k), as shown in (1), where \otimes stands for the inference operation,
or a rule system operation, which can be logical operations, algebraic
operations, or a hybrid. Equation (1) also defines a quantitative
measurement for state S_j(k) in that category at the specified place. The
coefficient \lambda_ij can be defined by the user, or it can be related to
a threshold value obtained from the historical data set; the latter
approach was taken in the NHPSS implementation to enable automated
application of very granular rules without subjective input by experts.
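
A rule-governed state transition of this kind can be sketched as a small
state machine. The seven statuses come from the article; everything else is
an assumption made for illustration. In particular, the rule table below is
hypothetical and is not the transition diagram of Fig. 1, and the supporting
set is reduced to two made-up fields: "level" (sales divided by a historical
threshold, so 1.0 plays the role of the threshold coefficient) and "trend"
(-1, 0, or +1).

```python
from enum import Enum

class Status(Enum):
    HEALTHY = "healthy"
    CRITICAL = "critical"
    STARTING = "starting-unusual"
    UPWARD = "upward-trend-unusual"
    PEAK = "peak-unusual"
    DOWNWARD = "downward-trend-unusual"
    ENDING = "ending-unusual"

# Hypothetical rule base: (from state, condition on supporting set, to state).
RULES = [
    (Status.HEALTHY,  lambda x: x["level"] > 1.0,                    Status.CRITICAL),
    (Status.CRITICAL, lambda x: x["level"] > 1.0 and x["trend"] > 0, Status.STARTING),
    (Status.CRITICAL, lambda x: x["level"] <= 1.0,                   Status.HEALTHY),
    (Status.STARTING, lambda x: x["trend"] > 0,                      Status.UPWARD),
    (Status.UPWARD,   lambda x: x["trend"] <= 0,                     Status.PEAK),
    (Status.PEAK,     lambda x: x["trend"] < 0,                      Status.DOWNWARD),
    (Status.DOWNWARD, lambda x: x["level"] <= 1.0,                   Status.ENDING),
    (Status.ENDING,   lambda x: x["level"] <= 1.0,                   Status.HEALTHY),
]

def next_status(current, x, rules=RULES):
    """Apply the first matching rule; otherwise remain in the current state."""
    for source, condition, target in rules:
        if source is current and condition(x):
            return target
    return current

def run(inputs, start=Status.HEALTHY):
    """Step through a sequence of supporting sets, one per time unit."""
    path = [start]
    for x in inputs:
        path.append(next_status(path[-1], x))
    return path

inputs = [
    {"level": 1.2, "trend": 1}, {"level": 1.4, "trend": 1},
    {"level": 1.6, "trend": 1}, {"level": 1.6, "trend": 0},
    {"level": 1.3, "trend": -1}, {"level": 0.9, "trend": -1},
    {"level": 0.8, "trend": 0},
]
path = run(inputs)
```

Keeping the rules in a table rather than in code branches mirrors the idea
that the transition is governed by a rule base that can be derived from
data or edited by domain experts.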

Equation (2) describes that, at a state S_i(k), there is the supporting set
X_i(k) with three structural components whose respective thresholds
[\alpha(k), \beta(k), \gamma(k)] can be incorporated. The rule system R_x
maps the components into the supporting set.

Equation (3) describes the output mapping, which interprets the outputs
from a set of states or a state history with the specified weights for the
states given by (\gamma_0(k), \gamma_1(k), ..., \gamma_n(k)). In addition,
the rule system combines the background information G_i, such as the
environmental factors, with the population demographics in the study area.

Equations (4) and (5) define the supporting system X as an additive
combination of supporting sets, thus allowing inputs to be comprised of
multiple data sources. In the NHPSS evaluation period, data sources
included the OTC sales data, emergency department encounters, school
closure events, and case count data.

Equation (6) defines that the value of an output is a combination of the
likelihood index of abnormality (L_ij), the trend indicator (T_ij), and the
potential impact index (P_ij). An exemplary set has been defined here as:

{L_ij: (low, medium, high)},

{T_ij: (upward, downward)},

{P_ij: (minor, moderate, significant)}.

Consider the sample case where y = (medium, upward, significant). This case
represents a medium likelihood of abnormality, with upward trend status,
and possible significant potential impact. In reality, this situation might
need extensive management.
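
The three-part output can be represented as a small validated record. This
is a sketch only: the class name, field names, and the wording of the
description are assumptions, while the allowed values are the exemplary
sets listed above.

```python
from dataclasses import dataclass

LIKELIHOOD = ("low", "medium", "high")
TREND = ("upward", "downward")
IMPACT = ("minor", "moderate", "significant")

@dataclass(frozen=True)
class Output:
    likelihood: str   # likelihood index of abnormality
    trend: str        # trend indicator
    impact: str       # potential impact index

    def __post_init__(self):
        # Reject values outside the exemplary sets.
        for value, allowed in ((self.likelihood, LIKELIHOOD),
                               (self.trend, TREND),
                               (self.impact, IMPACT)):
            if value not in allowed:
                raise ValueError(f"{value!r} not in {allowed}")

    def describe(self):
        return (f"{self.likelihood} likelihood of abnormality, "
                f"{self.trend} trend, {self.impact} potential impact")

sample = Output("medium", "upward", "significant")
```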

A knowledge base could derive its information from data sets. Fensel and
Studer (1999) [8] provide a comprehensive description of the application of
knowledge acquisition and management. In NHPSS, a knowledge base compiles
the incoming raw data into the designed forms. Next, it derives the
relational facts, temporal characteristics, and regional patterns. Data
processing methods include statistical analysis over space and time. The
knowledge base organizes the information such that queries or evaluations
posed to the knowledge base can be answered by means of an inference-based
query-then-answering operation, or alternatively, an automated operation on
evaluation and response. During the development of NHPSS, GIS was
integrated with the knowledge base. With knowledge-base technology, large
data sets are processed along temporal and spatial dimensions. Information
is derived to characterize the spatial distribution over time. Then, the
spatial data warehouse organizes the knowledge and its derivative
information hierarchically by spatial areas. Figure 2 illustrates a
spatiotemporal knowledge acquisition by deriving the seasonally varying
reference lines, the trends, the extreme values, and the clusters for the
OTC sales in geographical dimensions for the categorized disease syndromes
in a local area, incorporating regional and statewide information.

A rule system is a set of rules, arguments, constraints, relations, and
responses. A rule can be numerical, logical, or both. A hybrid rule system
consists of both explicit functions and logical rules. Bardossy and
Duckstein (1995) [9] have an excellent introduction to rule systems. The
rule system in NHPSS is a hybrid rule system that was developed for
automated operations of the BIS. The rule system is implemented with a set
of decision matrices. The developed rule system consists of sets of logical
rules. Combining statistical analysis and epidemiology knowledge, the rule
system evaluates the decision matrices and produces the responses. Examples
are the comparison of the incoming data with the set of references.
Differences are then quantitatively and qualitatively evaluated with
respect to the space-time dimensions. The abnormalities of the OTC medicine
sales are identified and assessed using relational algebra, relational
calculus, and classical calculus. Figure 3 illustrates the rule system
approach. Figure 4 shows the integration of the spatial data warehouse and
knowledge-base techniques with the rule system to support the automated
analysis and reporting.
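
A hybrid rule system built on a decision matrix can be sketched as two
numerical rules feeding a logical lookup. The category names, the matrix
entries, and the responses below are hypothetical and chosen only to show
the mechanism; they are not the NHPSS decision matrices.

```python
def classify_level(sales, base, upper):
    """Numerical rule: place today's sales relative to the reference lines."""
    if sales > upper:
        return "above_upper"
    if sales > base:
        return "above_base"
    return "at_or_below_base"

def classify_trend(previous, current):
    """Numerical rule: direction of change since the previous time unit."""
    if current > previous:
        return "rising"
    if current < previous:
        return "falling"
    return "flat"

# Logical part: rows are levels, columns are trends, cells are responses.
DECISION_MATRIX = {
    "above_upper":      {"rising": "alert", "flat": "alert", "falling": "watch"},
    "above_base":       {"rising": "watch", "flat": "none",  "falling": "none"},
    "at_or_below_base": {"rising": "none",  "flat": "none",  "falling": "none"},
}

def respond(previous, current, base, upper):
    """Evaluate the decision matrix for one incoming observation."""
    level = classify_level(current, base, upper)
    trend = classify_trend(previous, current)
    return DECISION_MATRIX[level][trend]

response = respond(previous=14, current=18, base=12, upper=16)
```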



Implementation of NHPSS

NHPSS was implemented as a distributed information system. Figure 5 shows
the NHPSS system structure. Daily OTC pharmaceutical sales data are
collected at each store, recorded at the pharmacy chain headquarters, and
transmitted to NH DHHS. These data are replicated in data servers at the
state public health department. The developed BIS data warehouse organizes
data along logical dimensions. Next, automated data processing occurs in
the application servers. Finally, analyses, reports, and alerts (if
necessary) are generated to assist the decision-making process of public
health management. The user interface is Internet browser-based. With
secured access, users can browse the data, search the reports and maps, and
review the results of the trend analyses and unusual event detection
methods as created by the built-in rule base.

GIS plays a key role in the BIS spatial data warehouse with knowledge
acquisition, as was introduced above. It also performs the spatial
analysis, such as abnormality analyses with scanning and ranking.
Furthermore, GIS supports the outputs mapping for risk assessment and
provides comprehensive reports with possible alerts. Figure 6 depicts the
integration of GIS for knowledge acquisition and OTC analysis in NHPSS.

Fig. 2. Spatiotemporal knowledge acquisition schema in NHPSS.

Fig. 3. Illustration of the rule system in NHPSS.

Fig. 4. Integration of spatial data warehouse, knowledge base, and rule
system in NHPSS.

The main syndromic surveillance functions of the NHPSS BIS implementation
can be summarized as:

(1) Automated data capture of OTC pharmaceutical sales data:

> Approximately 300 different pharmaceutical items are currently
categorized into Gastrointestinal Diseases and Respiratory Illnesses.

> The measurement unit of disease indicators can be the daily number of
sold packages (NHPSS), or the amount of sold active ingredient.

(2) Automated processing for data-derived reference lines (from historical
data):

> Central line: monthly-averaged (or weekly-averaged) daily sales

> Control lines: Min., Max., N-sigma lines, and Confidence Interval Upper
Limits.

(3) Analysis and Reporting in Time Dimensions:

> Detailed or aggregated reporting in daily/weekly/monthly units for the
selected place, with capability of comparison to the historical data.

(4) Analysis and Reporting in Geographical Areas:

> Map display with alerting capability for the specified time and disease
indicator

> Pinpoint the unusual areas.

(5) Rule-based Trend Analysis and Event Detections:

> detection of an unusual single-point-value event by comparison to the
control lines

> detection of clusters and early warning of cluster drifting

> detection and early warning of weekly average drifting

> detection and early warning of potential trend shifting

> detection of starting date, peak, and ending date of an event

Figure 7 shows NHPSS hierarchical decision support step by step, from a
time series alert at the state level to pinpointing the unusual local areas
and its detailed reports.
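
Two of the detections listed above, single-point events and clusters with
their starting, peak, and ending dates, can be sketched against one control
line. The definition of a cluster as a run of at least `min_run`
consecutive exceedances is an assumption made for this example; the article
does not define it.

```python
def detect_events(series, control_line, min_run=3):
    """Find single-point exceedances and clusters in a daily series.

    Returns (points, clusters): points are the indices of all days above
    the control line; each cluster records the start, peak, and end index
    of a run of at least min_run consecutive days above the line.
    """
    values = list(series)
    points = [i for i, v in enumerate(values) if v > control_line]
    clusters = []
    run_start = None
    # A sentinel equal to the control line closes a run that reaches the end.
    for i, v in enumerate(values + [control_line]):
        if v > control_line:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                window = values[run_start:i]
                peak = run_start + window.index(max(window))
                clusters.append({"start": run_start, "peak": peak, "end": i - 1})
            run_start = None
    return points, clusters

points, clusters = detect_events([10, 17, 11, 16, 18, 21, 17, 12], control_line=15)
```

Here day 1 is an isolated single-point event, while days 3 to 6 form one
cluster that peaks on day 5.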



Pilot Application of NHPSS

The pilot application of the NHPSS BIS started in December 2002 at the
Bureau of Communicable Disease Control and Surveillance (BCDCS), NH DHHS.
Pharmacy stores in 23 cities reported daily OTC sales for a select set of
pharmaceuticals to NH DHHS. The participating pharmacy stores represent
approximately 10% of all stores in NH statewide and about 30% in the major
cities. Since then, NHPSS has successfully supported BCDCS in detecting a
large-scale gastrointestinal disease outbreak at the end of 2002 at both
state and local levels, and a major influenza outbreak in February of 2003.
The NHPSS output was compared to hospital ER counts, case counts, and
laboratory information, and some school closure events. All data sources
covered the same time period in the same area. The hospital ER data cover
about 60% of total ER visits in New Hampshire statewide. For the
gastrointestinal disease outbreak, the NHPSS triggered an alert four days
prior to the recognition of the outbreak at the state level. In local
areas, such as communities, it has provided alerts up to ten days early.
Furthermore, for some cities, NHPSS has provided alerts up to 12 days prior
to the closure of schools, combatting the spread of the influenza outbreak.
Equally important is that the spatial and temporal characteristics of the
outbreaks can be reported by the embedded Internet GIS application. Figure
8 displays the analysis results of NHPSS outputs, with the GIS tool
scanning and ranking the abnormality in support of risk assessment. Several
other less impactful localized and statewide events have been detected and
described by the NHPSS BIS.

Fig. 5. Diagram of the NHPSS system structure.

Conclusions

Developing syndromic surveillance systems to support early warning of both
natural and possibly intentional public health events has received
significant attention in the past two years. This article has described a
biointelligence system that integrates advanced information technology with
dynamic-system theory to support the early detection of potential disease
outbreaks. A new dynamic model was developed in the state-space form to
describe the change of public health status incorporating multiple sources
of syndromic surveillance data. A measurement scheme for the spatially and
temporally varying time series data was developed. Spatial data warehouse
and knowledge-base techniques are integrated with automated data processing
and automated knowledge acquisition. A companion rule system governs the
state transitions and supports the event detection. The implemented system,
NHPSS, has Internet GIS to support spatial analysis and decision-making, as
well as a commercial off-the-shelf Web-based reporting tool. The pilot
application has yielded promising results. Beyond the merits of the
developed system for the OTC sales surveillance, the framework has been
generalized for multiple data sources. This system is still in the early
stage, but the preliminary results already start to challenge the
traditional methods in their own area of excellence or where they cannot be
applied.

NHPSS has demonstrated that public health management can greatly benefit
from early warnings for disease outbreaks through the implementation of
automated syndromic surveillance. The largest remaining challenge is the
cooperation of multiple organizations and the private sectors in sharing
the information for the common goal while preserving confidentiality of
proprietary interests such as market penetration by individual pharmacy
chains.